Viterbi Algorithm
Description
Given observations: $$ O = \{O_1, \dots, O_T \} $$ find the most likely corresponding state sequence: $$ Q = \{Q_1, \dots, Q_T \} $$ under a given HMM $ \lambda = (A, B, \pi) $, with transition probabilities $ a_{ij} $, emission probabilities $ b_j(\cdot) $, and initial state distribution $ \pi_i $.
Preliminary
Let $ \delta_t(i) $ be the probability of the best partial state sequence that ends in state $ S_i $ at time $ t $, given the first $ t $ observations. $$ \begin{aligned} \delta_t(i) &\triangleq \max_{Q_1 \cdots Q_{t-1}} P[Q_1 \cdots Q_{t-1}, Q_t = S_i, O_1 \cdots O_t | \lambda] \end{aligned} $$
That is, $ \delta_t(i) $ is the best score along a single path, at time $ t $, which accounts for the first $ t $ observations and ends in state $ S_i $.
By induction: $$ \begin{aligned} \delta_{t+1}(j) &= \max_{Q_1 \cdots Q_t} P[Q_1 \cdots Q_t, Q_{t+1} = S_j, O_1 \cdots O_{t+1} | \lambda] \\ &= P[O_{t+1}|Q_{t+1} = S_j, \lambda] \cdot \max_{i} \Big( P[Q_{t+1} = S_j | Q_t = S_i, \lambda] \cdot \max_{Q_1 \cdots Q_{t-1}} P[Q_1 \cdots Q_{t-1}, Q_t = S_i, O_1 \cdots O_{t} | \lambda] \Big) \\ &= [\max_{i} \delta_{t}(i) \cdot a_{ij}] \cdot b_{j}(O_{t+1}) \end{aligned} $$
Procedure
($ \psi $ is for memoization: it records, for each state and time step, the best predecessor state to backtrack through.)
Initialization: $$ \begin{aligned} \delta_{1}(i) &= \pi_i \cdot b_{i}(O_1) \\ \psi_1(i) &= 0 \end{aligned} $$
Recursion: $$ \begin{aligned} \delta_{t+1}(j) &= \max_{i} [\delta_{t}(i) \cdot a_{ij}] \cdot b_j(O_{t+1}) \\ \psi_{t+1}(j) &= \arg\max_{i} [\delta_{t}(i) \cdot a_{ij}] \end{aligned} $$
Termination: $$ \begin{aligned} Q_{T}^{*} &= \arg\max_{i} [\delta_{T}(i)] \\ P_{T}^{*} &= \max_{i} [\delta_{T}(i)] \end{aligned} $$
Path Backtracking (for $ t = T, T-1, \dots, 2 $): $$ \begin{aligned} Q_{t - 1}^{*} = \psi_{t}(Q_{t}^{*}) \end{aligned} $$
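A minimal sketch of this procedure in Python/NumPy, assuming the HMM is given as arrays `pi` (initial distribution), `A` (`A[i, j]` $= a_{ij}$), and `B` (`B[j, o]` $= b_j(o)$) over integer-coded observations; these names and the function `viterbi` are illustrative, not from the text, and arrays are 0-indexed, so time $ t $ lives at index $ t-1 $:

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely state sequence for observations `obs` under HMM (pi, A, B).

    pi:  (N,)   initial state probabilities pi_i
    A:   (N, N) transition probabilities, A[i, j] = a_ij
    B:   (N, M) emission probabilities, B[j, o] = b_j(o)
    obs: (T,)   integer-coded observations O_1 ... O_T
    """
    N, T = len(pi), len(obs)
    delta = np.zeros((T, N))            # delta[t, i]: best score ending in state i
    psi = np.zeros((T, N), dtype=int)   # psi[t, j]:   best predecessor of state j

    # Initialization: delta_1(i) = pi_i * b_i(O_1), psi_1(i) = 0
    delta[0] = pi * B[:, obs[0]]

    # Recursion: delta_{t+1}(j) = max_i [delta_t(i) * a_ij] * b_j(O_{t+1})
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A   # scores[i, j] = delta_t(i) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]

    # Termination: Q*_T = argmax_i delta_T(i), P*_T = max_i delta_T(i)
    q = np.zeros(T, dtype=int)
    q[-1] = delta[-1].argmax()
    p_star = delta[-1].max()

    # Path backtracking: Q*_{t-1} = psi_t(Q*_t)
    for t in range(T - 1, 0, -1):
        q[t - 1] = psi[t, q[t]]
    return q, p_star
```

For example, `viterbi(np.array([0.6, 0.4]), np.array([[0.7, 0.3], [0.4, 0.6]]), np.array([[0.5, 0.5], [0.1, 0.9]]), np.array([0, 1, 1]))` returns the most likely state-index sequence together with $ P_{T}^{*} $.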
Algorithm in Theory
- $ \underline{\delta_0} $
- $ \forall t \in [1, T] $:
    - $ \forall j \in [1, N] $:
        - $ \delta_{t}(j) = \max_{i} [\delta_{t-1}(i) \cdot a_{ij}] \cdot b_j(O_{t}) $
    - $ \forall j \in [1, N] $:
        - $ \psi_{t}(j) = \arg\max_{i} [\delta_{t-1}(i) \cdot a_{ij}] $
Algorithm in Practice
- $ \underline{f_0} $
- $ \forall t \in [1, T] $:
    - $ \forall j \in [1, N] $:
        - $ f_t(j) = 0 $
        - $ \forall i \in [1, N] $:
            - $ f_{t}(j) = \max(f_{t-1}(i) \cdot a_{ij}, f_t(j)) $
        - $ f_t(j) = f_t(j) \cdot b_j(O_{t}) $
    - $ \forall j \in [1, N] $:
        - $ \psi_{t}(j) = \arg\max_{i} [f_{t-1}(i) \cdot a_{ij}] $
- Complexity: $ \Theta(N^2 \cdot T) $. Advancing from one time step (layer) to the next costs $ N^2 $ operations, since each of the $ N $ states maximizes over $ N $ predecessors, and there are $ T $ layers.
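As a sketch of how these loops translate to code (assuming the same illustrative `pi`, `A`, `B`, `obs` arrays as in the earlier sketch; the names `f`, `psi`, and `viterbi_loops` are not from the text), the nested loops make the $ \Theta(N^2 \cdot T) $ cost explicit:

```python
import numpy as np

def viterbi_loops(pi, A, B, obs):
    """Triple-loop Viterbi; f plays the role of delta, kept as a (T, N) table."""
    N, T = len(pi), len(obs)
    f = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)

    f[0] = pi * B[:, obs[0]]              # initialization: f_1(i) = pi_i * b_i(O_1)

    for t in range(1, T):                 # T - 1 forward steps ...
        for j in range(N):                # ... each costing N * N operations
            f[t, j] = 0.0
            for i in range(N):
                # f_t(j) = max(f_{t-1}(i) * a_ij, f_t(j)); remember the best predecessor
                if f[t - 1, i] * A[i, j] > f[t, j]:
                    f[t, j] = f[t - 1, i] * A[i, j]
                    psi[t, j] = i
            f[t, j] *= B[j, obs[t]]       # multiply in the emission term b_j(O_t)

    # Termination and backtracking, as in the Procedure section
    q = np.zeros(T, dtype=int)
    q[-1] = f[-1].argmax()
    for t in range(T - 1, 0, -1):
        q[t - 1] = psi[t, q[t]]
    return q, f[-1].max()
```

(For long observation sequences, implementations commonly switch to log-probabilities, turning the products into sums, to avoid floating-point underflow.)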